(54) Method of virtual retrieval of analogs of lead compounds by constituting potential libraries

(57) The method according to the invention rapidly retrieves potentially active analogs of lead compounds by generating and screening a large database of 3D multiconformational fingerprints of chemically feasible combinatorial products, mainly by: modeling the radicals by linking them temporarily to a bulky spacekeeper group; registering such 3D models of the radicals in a combinatorial ghost database; for any molecular structure that is accessible within the ghost database, detecting any atom that displays physical property features of the pharmacophore type; for the bipolar pharmacophores (BPs) detected in each molecular structure, calculating all the distances between the involved atoms in every conformation of this molecule and creating a BP distance distribution density; generating a conformational fingerprint vector that contains all the distance distribution densities of the pharmacophores (BPs); defining a scoring function for each molecular fingerprint accounting for the relative importance of the pharmacophore features; and comparing the fingerprints of the lead to each fingerprint of the potential library according to the above scoring function as maximized for the lead, and retrieving the molecules of the potential library for which the scoring function gives score values less than a specified threshold.
Description

[0001] The invention relates to combinatorial chemistry, and to methods to screen and retrieve from libraries molecules that are structurally related to a lead compound, in order to find synthetic analogs, e.g. analogs with similar biological or physical properties.

[0002] Virtual screenings, which evaluate similarity scores between each compound of a database and the reference or lead compound and retrieve the most similar molecules, are already known. In particular, the prediction of structures of active analogs, starting from a learning set of compounds of known biological activities, is a research field in full development, where many different approaches have been reported and tested, such as the simulation of the "docking" of a ligand in a receptor site (Ajay & Murcko, J.Med.Chem., 388, pp. 681, 1995), the free energy perturbation approaches (Kollmann, Chemical Revue 1993, pp. 2395, 1995), or the 3D screening approaches (Tripos Technical Notes, Vol. 1, Nr. 2 - Molecular Diversity Manager, October 1995, as well as the program Cerius2 Drug Discovery Workbench from "MSI" Inc., Molecular Simulations Incorporated).

[0003] However, the problem to be solved here is the retrieval of similar, potentially active analogs of molecules of known biological activity out of the large collection of hundreds of millions of combinatorial products that can be synthesized from the building blocks of a reference library using the available chemical know-how, in order to select a biased sublibrary having a maximum content of active compounds, which can be synthesized and tested with respect to the presumed biological activity.

[0004] Prior art approaches are either too time-consuming or too unrealistic to provide both fast and accurate retrieval of active analogs that are chemically feasible.

[0005] Recently published 2D approaches can be used to describe large libraries of molecules in terms of connectivity descriptors, such as the one published by Higgs, Bemis, Watson & Wikel in J.Chem.Inf.Comp.Sci., Vol. 37, n°5, pp. 861, 1997. However, they do not account for geometrical and conformational aspects. Besides, approaches which analyze the molecular connectivity of large sets of candidates are often flawed by their lack of realism in the description of the molecules.

[0006] More realistic approaches, like Tripos's approach called "COMFA" (Comparative Molecular Fields Analysis), require an unambiguous superposition of the compared molecules, i.e. only fairly similar compounds (compounds having a common "template" or skeleton) can be meaningfully compared to each other. Furthermore, a great uncertainty remains on whether the calculated superposition mode of the compared molecules is physically relevant with respect to the binding mode to a receptor.

[0007] Furthermore, most of the prior art approaches perform retrieval of active analogs out of more or less random collections of products, which often leads to situations where the retrieved molecules cannot be chemically synthesized, are unstable or are generally inadequate for use as drugs.

[0008] The present invention aims to solve these problems and proposes a new approach based on an optimized trade-off between the degree of realism of the description of the molecules and the rapidity of retrieving them, by generating potential libraries that encode 3D multiconformational information under the form of pharmacophoric fingerprints of combinatorial products, and by screening these libraries using scoring functions with a number of specifically chosen parameters.

[0009] More precisely, the object of the present invention is a method to rapidly retrieve potentially active analogs of 
lead compounds, wherein 3D fingerprints are generated and screened from a large database of 3D multiconformational 
fingerprints of chemically feasible combinatorial products as follows: 

- selecting systematically by a chemical filter algorithm, according to elementary chemical rules, couples of radicals each including a reactive center, as valid reaction partners, from Building Blocks (BBs) constituents of a reference library, the reactive centers being detected by analysing the molecular connectivity and the leaving groups being deleted;

- modeling the radicals by linking them temporarily to a bulky spacekeeper group and submitting the resulting complex to a conformational sampling run yielding a collection of conformers, then removing the spacekeeper, in order to ensure that the reactive centers of the radicals are sterically accessible and that their free valencies point towards a region of free space previously occupied by the spacekeeper;

- registering such 3D models of the radicals in a combinatorial ghost database, after verifying that these models comply with the coupling ability according to the steric hindrance and conformational diversity criteria of the ghost database, this ghost database emulating a database of combinatorial products by instantly generating the structure of any such product by linking together the registered structures of the constituting radicals;

- for any molecular structure that is accessible within the ghost database, detecting any atom that displays physical property features of the pharmacophoric type, involved in determining the intensity of intermolecular interactions, at least the hydrophobic, aromatic, hydrogen bond donor, hydrogen bond acceptor, anionic or cationic characters, on the basis of elementary rules accounting for the chemical nature of such an atom and the molecular environment in which it is placed, and listing all the possible bipolar pharmacophore pairs (BPs) in which each atom displays a given pharmacophore feature;

- for the pharmacophores (BPs) detected in each molecular structure, calculating all the distances between the involved atoms in every conformation of this molecule and creating a BP distance distribution density, by fuzzy monitoring of the number of BPs for which the associated distances fall within given ranges;

- generating a conformational fingerprint vector that contains all the distance distribution densities of the pharmacophores (BPs) associated with a current conformation, and obtaining an average molecular fingerprint from the conformational fingerprints of the considered conformers to constitute and supply a potential library;

- defining a scoring function for each molecular fingerprint accounting for the relative importance of the pharmacophoric features owing to weighting factors that are calibrated in order to maximize a discriminative power with respect to different binding affinities; and

- generating the fingerprints of the lead for which analogs are to be retrieved, comparing these fingerprints to each fingerprint of the potential library according to the above scoring function as maximized for the lead, and retrieving the molecules of the potential library for which the scoring function gives score values less than a specified threshold.

[0010] In particular embodiments, the weighting factors of the scoring function are calibrated in order to maximize the discriminative power between families of ligands of different receptors, in a so-called "General Diversity paradigm", or to maximize the discriminative power between the compounds that bind to a given receptor, in contrast to those that have no binding affinity with respect to it, in a so-called "Receptor-Oriented Diversity paradigm".

[0011] In a preferred embodiment, the method of the present invention includes, at the stage of generating the potential library, preliminary checking steps in order to discard the pairs of building blocks that cannot be used as partners in any of the available synthesis protocols, either due to the absence of appropriate reactive groups or due to the presence of potentially interfering groups that may trigger unwanted side reactions, in order to ensure that the generated potential library does not contain compounds which could be formally represented as the coupling products of two BBs, but which for chemical reasons cannot be obtained in that way.

[0012] Therefore, all the molecules composing the "potential library" can actually be synthesized and represent pharmacologically acceptable species (without "exotic", very reactive or unstable groups), with the direct consequence that any of the analogs retrieved by the virtual screening of the invention can be synthesized with little effort.
[0013] The method according to the present invention allows biased libraries that include the herein retrieved analogs to be designed, these biased libraries to be synthesized and the activities of their products to be evaluated. Furthermore, the structure-activity data thus generated can be used to improve the parametrization of the scoring function or to initiate new predictive approaches, such as neural networks, able to estimate the required activity of the molecule on the basis of its fingerprint.

[0014] There are several advantages of the present approach, in contrast to other drug design strategies and information management schemes in combinatorial chemistry.

[0015] The method according to the present invention can be integrated into a discovery paradigm that no longer needs a primary, "blind" screening of a compound library, if at least one ligand structure is known for the studied receptor.
[0016] Such a paradigm includes the steps of: generating and updating the potential library of fingerprints of combinatorial compounds; retrieving potentially active analogs of known ligands out of this potential library on the basis of the "General Diversity" paradigm, and designing a biased sublibrary; synthesizing the biased library and identifying active compounds; training and adjusting the parameters in order to define the "Receptor-Oriented Diversity" scoring function, or to obtain a predictive neural network; and retrieving and synthesizing other potentially active compounds according to the previously calibrated scoring function or neural network.

[0017] The potential library is directly linked to the building block stock databases, and updated according to the changes in available building blocks and validated chemistries; this is a clear advantage over the concept of virtual libraries, which contain more or less random selections of compounds that may or may not be chemically feasible and/or pharmacologically interesting.

[0018] The build-up of the fingerprints stored in the potential library takes advantage of the extremely fast access to the multiconformational models of the combinatorial products from the combinatorial ghost database, which removes the need for an explicit generation of three-dimensional models for the up to 100 million product molecules; using one of the fastest conformational sampling programs, such as the software "Catalyst" of MSI (Molecular Simulations Incorporated), which is claimed to process up to 10,000 compounds per 24 CPU hours, far more than 1000 days would be required to complete such a task.

[0019] The combinatorial ghost database offers immediate access to the multiconformational models of any combinatorial product and instantly generates them by linking together the registered conformers of the radicals that constitute this product and performing a 2 or 3-step torsional angle driving around the newly formed bond. The obtained conformers of the product are free of interatomic clashes, due to the precautions taken when modeling the radicals prior to their registration in the ghost database.

[0020] An explicit check of the quality of the geometries obtained by the coupling is nevertheless performed, in spite of those precautions. This operation is hardly more time-consuming than the input of molecular data files of the product: access to the structures of the (MxN) combinatorial products of a combined type that can be obtained out of M BBs of a first type and N BBs of a second type is gained at the cost of the conformational sampling effort required to obtain the 3D models of the (M+N) BBs.
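
The saving described above can be checked with a short calculation. The sketch below is only an illustration: the values M = N = 10,000 are hypothetical, chosen so that MxN equals the 100 million products mentioned in the description, and the throughput is the figure quoted for "Catalyst".

```python
# Hypothetical library sizes, chosen so that M * N matches the
# "up to 100 million" products mentioned in the description.
M = 10_000  # building blocks of the first type (assumption)
N = 10_000  # building blocks of the second type (assumption)
THROUGHPUT_PER_DAY = 10_000  # compounds per 24 CPU hours, as quoted in the text


def sampling_days(n_structures: int, per_day: int = THROUGHPUT_PER_DAY) -> float:
    """CPU days needed to run conformational sampling on n_structures."""
    return n_structures / per_day


explicit_days = sampling_days(M * N)  # every product sampled on its own
ghost_days = sampling_days(M + N)     # only the building blocks are sampled

print(f"{M * N} products sampled explicitly: {explicit_days:.0f} CPU days")
print(f"{M + N} building blocks sampled once: {ghost_days:.0f} CPU days")
```

Under these assumed sizes, the explicit route scales with the product MxN while the ghost database route scales with the sum M+N.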

[0021] The generated fingerprints represent distance distribution densities between pairs of atoms matching a pair of given pharmacophoric features. The rule-based identification of the atoms displaying given pharmacophore features being very fast, the generation of the fingerprints of 100 million compounds would take some tens of days, depending on the processor, the operating system and its peripherals, for instance an estimated 20 to 30 days on a Unix workstation. The fingerprints can be used to describe individual conformations, molecules, as well as collections of molecules within a unified formalism. Histograms corresponding to these fingerprints can be straightforwardly plotted and interpreted.
[0022] Other advantages and features of the present invention will be disclosed in the following detailed description of a non-limitative embodiment, with reference to the annexed figures, which respectively show:

Figure 1, a schematic flowchart of a first series of steps of an exemplified method according to the invention, relative to a chemical algorithmic filter to select couples of radicals;

Figure 2, a schematic flowchart of the following series of steps of such a method, relative to a geometrical algorithmic filter to provide and register 3D models of the radicals in a ghost database.

[0023] The construction of potential libraries according to the invention relies on the collection of building blocks, referred to as BBs, currently available from a library, and on a set of synthesis protocols, both of which are regularly updated. Each such update automatically triggers the update of the potential libraries. A database containing the molecular 2D-sketches of the BBs furnishes a description of the molecular connectivity of the BBs.

[0024] Each synthesis protocol preferably requires, in the initial BB molecules, the presence of appropriate functional groups and the absence of potentially interfering reactive groups which may lead to side reactions. A preliminary algorithm selects the required molecules according to their chemical compatibility, as defined in a "reactivity profile" that depends on the considered synthesis protocol, the chemical properties of the ligands of the receptor or the binding ability to the receptor. Such an algorithm is within the ordinary skill of the person skilled in the art.

[0025] Furthermore, each synthesis protocol may involve functional "transformers" to be appended to a first BB prior to its coupling to a second BB. Thus, the chemical reactions hereafter considered are:

- either direct coupling processes between building blocks, such as for instance, with R1-COOH as a first BB and H2N-R2 as a second BB:

R1-COOH + H2N-R2 → R1-CO-HN-R2 + HOH

- or coupling with a functional transformation of the first BB prior to the coupling to the second BB, such as in:

R1-NH2 + Im-CO-Im → R1-NH-CO-Im + HIm (Im = imidazole)
R1-NH-CO-Im + H2N-R2 → R1-NH-CO-NH-R2 + HIm

[0026] Transformers (the carbonyl group -C(=O)- in the above example) replace the original reactivity by a new one, opening the possibility of using the modified BBs in syntheses that are not feasible with the original ones.

[0027] A first algorithmic filter, referred to as the chemical filter, is implemented in order to check whether each BB qualifies for a given reaction, according to reactivity specifications listed in the corresponding synthesis protocol.

[0028] The chemical filter is used to select two subsets of BBs, of type A and of type B respectively, which are considered to be valid reaction partners, to yield products of the type A-T-B, with T being the transformer, if any, required by the considered chemical assembly strategy. It involves preliminary steps to "clean" the BBs by removing accompanying counterions and cutting away the leaving groups to form radicals in which the reactive center is identified as such.

[0029] An example of a chemical filter I is illustrated in figure 1. It comprises steps 1 to 3 as preliminary steps and steps 4 and 5 as conformational steps, as follows.

Step 1: scanning the so-called "reactivity profile" of the current synthesis, which specifies:

- the required groups, in consideration of their reactivity and of their ability to bond with partners to make the synthesis possible, as well as the interfering groups and the transformers,
- the type of the reactive center or the degree of substitution of nucleophilic centers (e.g. primary and secondary monoaromatic amines), out of lists of several choices,
- threshold values for the number of rotatable bonds (flexibility threshold) and for the molecular mass,

and then selecting the BB files.

Step 2: checking whether the selected BB contains a single molecule and deleting the counterions, if any (such as Cl- in R-NH3+ Cl-), the molecule being identified as the connected subgraph of maximal size;

Step 3: checking for the presence of any interfering groups in the selected BB that could trigger secondary processes, and discarding the current BB if any is found;

Step 4: checking for the presence of the required functional groups in the BB, and determining which atom, hereafter called the reactive center, is involved in forming a bond with the second BB. A part of the BB, called the "leaving group", is eliminated during the reaction to prepare the reactive center to be linked; this step detects and deletes the corresponding fragment.

For example, for a carboxylic acid in an amidification process, the leaving group -OH is deleted and the reactive center is set at the carboxyl carbon:

R-C(=O)-OH → R-C*(=O)-,

where * labels the reactive center.

If the BB contains several potentially reactive groups, then all the possible reactive centers are enumerated, and either the correct one is selected with auxiliary software or routine rules implemented by the skilled person, or the compound is discarded; if no reactive group is found, the compound is also discarded.
Step 5: if a functional transformation is specified because another reactive center is considered to be better suited to the synthesis, the transformer fragment is attached to the previously detected reactive center and a new reactive center is located at the atom of the transformer fragment that will form a bond with the partner BB, as, for example, with a reaction of the type:

R-HN*- + -C(=O)*- → R-HN-C*(=O)-



Thus, the chemical filter transforms the raw structures of the BBs into the corresponding fragments as they appear in a final product, by detecting the reactive centers, deleting the leaving groups and coupling to a transformer moiety when required. These fragments, referred to as "radicals", are liable to be reaction partners and are grouped into sets referred to as S1 and S2. One of the valencies of the reactive center is labeled as the free valency, to be used for coupling with the partner radicals in order to obtain the final products.
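
Steps 2 and 4 of the chemical filter can be sketched on a toy molecular graph. This is a minimal illustration, not the filter of the embodiment: the data layout (a dictionary of heavy atoms and a list of `(atom, atom, bond order)` triples) and the function names are hypothetical, only the carboxylic acid case is handled, and a BB with several acid groups is simply discarded.

```python
from collections import deque


def neighbours_of(atoms, bonds):
    """Map each atom index to {neighbour index: bond order}."""
    nb = {a: {} for a in atoms}
    for a, b, order in bonds:
        nb[a][b] = order
        nb[b][a] = order
    return nb


def strip_counterions(atoms, bonds):
    """Step 2 (sketch): keep only the largest connected subgraph."""
    nb = neighbours_of(atoms, bonds)
    seen, best = set(), set()
    for start in atoms:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:  # breadth-first walk over one connected subgraph
            a = queue.popleft()
            for b in nb[a]:
                if b not in component:
                    component.add(b)
                    queue.append(b)
        seen |= component
        if len(component) > len(best):
            best = component
    kept_atoms = {a: el for a, el in atoms.items() if a in best}
    kept_bonds = [(a, b, o) for a, b, o in bonds if a in best and b in best]
    return kept_atoms, kept_bonds


def carboxylic_acid_radical(atoms, bonds):
    """Step 4 (sketch): delete the -OH leaving group of R-C(=O)-OH.

    Returns (atoms, bonds, reactive_center), or None when the BB has no
    acid group or more than one (such BBs are discarded in this sketch).
    """
    nb = neighbours_of(atoms, bonds)
    hits = []
    for c, element in atoms.items():
        if element != "C":
            continue
        carbonyl = [o for o, order in nb[c].items()
                    if atoms[o] == "O" and order == 2]
        hydroxyl = [o for o, order in nb[c].items()
                    if atoms[o] == "O" and order == 1 and len(nb[o]) == 1]
        if carbonyl and hydroxyl:
            hits.append((c, hydroxyl[0]))
    if len(hits) != 1:
        return None
    center, leaving = hits[0]
    kept_atoms = {a: el for a, el in atoms.items() if a != leaving}
    kept_bonds = [(a, b, o) for a, b, o in bonds if leaving not in (a, b)]
    return kept_atoms, kept_bonds, center


# Acetic acid heavy atoms (0-3) plus an isolated chloride (4) as counterion.
atoms = {0: "C", 1: "C", 2: "O", 3: "O", 4: "Cl"}
bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]
clean_atoms, clean_bonds = strip_counterions(atoms, bonds)
radical = carboxylic_acid_radical(clean_atoms, clean_bonds)
```

In a real implementation the substructure rules would come from the reactivity profile of the synthesis protocol rather than being hard-coded.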

In the following rule-based algorithm II, a geometrical filter illustrated in figure 2, the construction of 3D geometries from the conformational sampling of the selected radicals is carried out so as to ensure that, in the resulting geometries, the reactive centers of the radicals are sterically accessible, i.e. the free valency points towards a region of free space, so that any radical can be concatenated with partner radicals without any clashes between the linked moieties which form the final product.

In order to achieve this, a bulky "spacekeeper" group linked to the selected radicals is used in connection with 
the conformational sampling. In the present embodiment (figure 2), the following steps are carried out. 
Step 6: hydrogens being added to the heavy atom skeleton of the sketches of each radical, six pharmacophore features (aromaticity, hydrophobicity, hydrogen bond donor property, hydrogen bond acceptor property, positive charge and negative charge) are checked and listed for every atom, considering the specific chemical groups involved and more particularly that:

- aliphatic amino groups are under cationic form, while aromatic amines are taken as neutral;
- a special flag is used to signal the presence of imidazole rings, which may appear under physiological conditions, at neutral pH, under either protonated or unprotonated form;
- carboxylate, sulphonate, phosphate groups and tetrazoles are considered to be anions.

Step 7: the 2D-sketch of the radical is anchored by its free valency to the following bulky spacekeeper group, the tris(triiodosilyl)methyl entity:

R-HN-C*(=O)- + -C(SiI3)3 → R-HN-C(=O)-C(SiI3)3

Step 8: the resulting 2D-sketch of this compound is submitted to a conformational sampling run, performed for instance by the "Catalyst" MSI software, yielding a collection of possible conformers of this compound.
Step 9: for each of the conformers obtained at step 8, the spacekeeper moiety is now severed and deleted, restoring the free valency of the radical, which now points towards the empty region previously occupied by the spacekeeper.

Advantageously, steric hindrance criteria are then applied in order to check whether the spacekeeper strategy has managed to ensure an appropriate accessibility of the reactive center for every conformation; the criteria are for instance:

- a comparison of the interatomic distances in the different conformations in order to make sure that the retained conformers are not redundant;
- an energy criterion discarding higher-energy conformers found to have an almost identical geometry with respect to lower-energy conformations.

At step 9, for each retained conformer, a coordinate transformation in a 3-dimensional OXYZ reference system is performed, in order to place the reactive center at the origin of the reference system and to align the free valency along the Z-axis. The coordinates of the conformers are stored.

Step 10: a list of all the potential compounds that are obtained by coupling each radical of the first set S1 with every one of its partners in S2 is then generated.
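
The coordinate transformation applied to each retained conformer at step 9 can be sketched as a translation followed by a rotation. The function names and the data layout (a list of `(x, y, z)` tuples, the index of the reactive center and the direction of the free valency) are hypothetical; the rotation uses the standard Rodrigues construction, which the description does not prescribe.

```python
import math


def rotation_onto_z(v):
    """Return a 3x3 rotation matrix that maps the direction v onto +Z."""
    norm = math.sqrt(sum(x * x for x in v))
    ux, uy, uz = (x / norm for x in v)
    if uz < -1.0 + 1e-12:
        # v is antiparallel to Z: a half-turn around the X axis does the job.
        return [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
    kx, ky = uy, -ux          # rotation axis k = u x z (its z component is 0)
    f = 1.0 / (1.0 + uz)      # Rodrigues factor (1 - cos) / sin^2
    return [
        [1.0 - f * ky * ky, f * kx * ky, ky],
        [f * kx * ky, 1.0 - f * kx * kx, -kx],
        [-ky, kx, 1.0 - f * (kx * kx + ky * ky)],
    ]


def standardize(coords, center, valency):
    """Put atom `center` at the origin and align `valency` with the Z axis."""
    r = rotation_onto_z(valency)
    cx, cy, cz = coords[center]
    out = []
    for x, y, z in coords:
        p = (x - cx, y - cy, z - cz)
        out.append(tuple(sum(r[row][k] * p[k] for k in range(3))
                         for row in range(3)))
    return out


# Toy radical: reactive center at (1, 1, 1), one body atom behind it, and a
# free valency pointing along +X (away from the body).
conformer = [(1.0, 1.0, 1.0), (-0.5, 1.0, 1.0)]
aligned = standardize(conformer, center=0, valency=(1.0, 0.0, 0.0))
```

After the transformation the body of the radical lies on the negative Z side, which is what makes the later mirroring and translation of the partner radical straightforward.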

[0030] This list serves as input for a "combinatorial explosion" ghost database of fingerprints and is submitted as a query to such a combinatorial ghost database, which rapidly generates the structures of the enumerated dimers by linking every conformer of the BBs of a first type to every conformer of the BBs of a second type, performing a 2 or 3-step torsional angle driving around the linkage bond, and evaluating the distances within each of the conformers so generated.

[0031] The minimal distance between any two heavy atoms belonging to the initial radicals and separated by more than three bonds in the final molecule is evaluated and checked against a predetermined threshold. If it is not above this threshold, that particular conformer of the dimer is discarded.
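
A minimal sketch of this clash check follows. It makes several assumptions that are not in the description: only pairs of atoms coming from different radicals are tested (the radicals themselves having been checked when they were modeled), the coordinate list contains heavy atoms only, and the 2.0 Angstrom default threshold is a placeholder value.

```python
import math
from collections import deque


def bond_separations(n_atoms, bonds, source):
    """Breadth-first search: number of bonds between `source` and each atom."""
    nb = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        nb[a].append(b)
        nb[b].append(a)
    sep = [None] * n_atoms
    sep[source] = 0
    queue = deque([source])
    while queue:
        a = queue.popleft()
        for b in nb[a]:
            if sep[b] is None:
                sep[b] = sep[a] + 1
                queue.append(b)
    return sep


def min_interradical_distance(coords, bonds, radical_of):
    """Smallest distance between atoms of different radicals > 3 bonds apart."""
    best = math.inf
    n = len(coords)
    for a in range(n):
        sep = bond_separations(n, bonds, a)
        for b in range(a + 1, n):
            if radical_of[a] == radical_of[b]:
                continue  # same radical: not tested in this sketch
            if sep[b] is not None and sep[b] <= 3:
                continue  # too close along the bond graph to count as a clash
            best = min(best, math.dist(coords[a], coords[b]))
    return best


def is_clash_free(coords, bonds, radical_of, threshold=2.0):
    """Keep the conformer only if the minimal distance is above the threshold."""
    return min_interradical_distance(coords, bonds, radical_of) > threshold


# Six heavy atoms in a chain: atoms 0-2 from radical 1, atoms 3-5 from radical 2.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
radical_of = [1, 1, 1, 2, 2, 2]
extended = [(1.5 * k, 0.0, 0.0) for k in range(6)]
folded = extended[:5] + [(0.0, 1.0, 0.0)]  # atom 5 folded back onto atom 0
```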
[0032] A distance distribution density of pairs of pharmacophore features over interatomic distances is constructed. The numbers of atom pairs that match a given pair of pharmacophore features and whose interatomic distance falls within a given range constitute the elements of a conformational fingerprint vector, with one set of elements for each of the combinations that can be obtained with the pharmacophore features.

[0033] A molecular fingerprint vector is then obtained by summation of the conformational fingerprints, followed by normalizing this sum with respect to the number of considered conformers of the product, to provide an average molecular fingerprint.

[0034] In the present embodiment, a combinatorial amplifier algorithm III constituting a "ghost" database (figure 2) carries out this combinatorial explosion through the following steps:

- in step 11, looping over all pairs listed at step 10, the current pair of radicals, the first from S1, the second from S2, is stored together with all their available conformers C, as obtained at step 9, and the previously established lists of pharmacophore features of the composing atoms. The pharmacophore features of the atoms that are linked by new bonds are reevaluated, since their chemical type has changed due to this chemical transformation;
- in step 12, each conformer C1 of the first radical S1 is linked with each conformer C2 of the second radical S2, by mirroring the coordinates of the latter with respect to the XOY plane and translating them along the Z axis in order to restore the correct length of the newly formed bond between C1 and C2; a 2 or 3-step rotation around the new axis is performed.

Therefore, the number of generated conformers of the coupling product equals the number of conformers of the first radical times that of the second radical, times the number of rotations around the newly formed bond;
- in step 13, in order to construct the distance distribution density of pairs of pharmacophore features, a complete set of interatomic distances is evaluated for the current conformation of each dimer; all the distances between pairs of atoms matching a given pair of pharmacophore features and taking a value between n and n+1 Angstroms, n being an integer, are counted;

- in step 14, a generated conformational fingerprint vector describing the current geometry is represented, in the present embodiment, by a (6x7)/2x12=252-element vector, where (6x7)/2=21 is the number of combinations that can be obtained with the 6 pharmacophore features f1 to f6 introduced supra (aromatic-aromatic, aromatic-hydrophobic, ..., anion-anion) and 12 is the number of distance ranges.
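
The linking operation of step 12 can be sketched as follows, assuming that both radicals are stored with their reactive center at the origin and their free valency along +Z. The function names are hypothetical, and the rotation steps are taken as equal fractions of a full turn, which is an assumption: the description only states that 2 or 3 steps are used.

```python
import math


def couple(conf1, conf2, bond_length, n_rotations=3):
    """Link one conformer of each radical; return one product per rotation."""
    # Mirror the second radical through the XOY plane (z -> -z), then shift
    # it along +Z so that its reactive center sits at (0, 0, bond_length).
    # A reflection inverts handedness; the sketch follows the text as written.
    placed = [(x, y, bond_length - z) for x, y, z in conf2]
    products = []
    for k in range(n_rotations):
        angle = 2.0 * math.pi * k / n_rotations
        c, s = math.cos(angle), math.sin(angle)
        rotated = [(c * x - s * y, s * x + c * y, z) for x, y, z in placed]
        products.append(list(conf1) + rotated)
    return products


def couple_all(confs1, confs2, bond_length, n_rotations=3):
    """All product conformers: len(confs1) * len(confs2) * n_rotations."""
    return [p
            for c1 in confs1
            for c2 in confs2
            for p in couple(c1, c2, bond_length, n_rotations)]


# Two toy radicals, each with its reactive center first and one body atom.
c1 = [(0.0, 0.0, 0.0), (0.0, 0.0, -1.0)]
c2 = [(0.0, 0.0, 0.0), (1.0, 0.0, -1.0)]
dimers = couple(c1, c2, bond_length=1.5, n_rotations=2)
```

The count returned by `couple_all` matches the product stated in the text: conformers of the first radical times conformers of the second, times the number of rotations.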

[0035] Each fingerprint element FP(fa,fb,i), fa and fb each being one of the features f1 to f6, is equal to the number of atom pairs matching a given pair of pharmacophore features (fa,fb) and separated by a distance between i+3 and i+4 Angstroms, i being an integer. For example, the fingerprint element FP(cation,aromatic,1) counts the number of atom pairs in which one atom is a cation, the other is an aromatic atom and the distance between them falls within the range 4 to 5 Angstroms.
[0036] A molecular fingerprint vector is obtained by summation of the conformational fingerprints, followed by normalizing this sum with respect to the number of considered conformers of the product. This fingerprint vector is stored in association with the identification codes of the two radicals which compose that product. The collection of all the fingerprints of all the possible coupling products between all BBs that qualify for the library synthesis processes defines a potential library.
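
The construction of the conformational and molecular fingerprints can be sketched as below. The sketch uses crisp, half-open bins (bin i covers distances from i+3 up to, but not including, i+4 Angstroms, for i = 1 to 12) instead of the fuzzy monitoring mentioned in the description, and the feature labels, function names and flat 252-element layout are illustrative choices.

```python
import math
from itertools import combinations_with_replacement

FEATURES = ("aromatic", "hydrophobic", "donor", "acceptor", "cation", "anion")
PAIRS = list(combinations_with_replacement(FEATURES, 2))  # 21 unordered pairs
PAIR_INDEX = {pair: k for k, pair in enumerate(PAIRS)}
N_BINS = 12  # 21 pairs x 12 bins = 252 elements


def pair_key(fa, fb):
    """Order-independent key of a feature pair."""
    return (fa, fb) if (fa, fb) in PAIR_INDEX else (fb, fa)


def conformer_fingerprint(coords, atom_features):
    """Count atom pairs per (feature pair, distance bin) for one conformer."""
    fp = [0.0] * (len(PAIRS) * N_BINS)
    for a in range(len(coords)):
        for b in range(a + 1, len(coords)):
            i = math.floor(math.dist(coords[a], coords[b])) - 3
            if not 1 <= i <= N_BINS:
                continue  # outside the 4 to 16 Angstrom window of this sketch
            # A set, so that one atom pair counts once per feature pair.
            keys = {pair_key(fa, fb)
                    for fa in atom_features[a] for fb in atom_features[b]}
            for key in keys:
                fp[PAIR_INDEX[key] * N_BINS + (i - 1)] += 1.0
    return fp


def molecular_fingerprint(conformers, atom_features):
    """Average of the conformational fingerprints over all conformers."""
    total = [0.0] * (len(PAIRS) * N_BINS)
    for coords in conformers:
        for k, value in enumerate(conformer_fingerprint(coords, atom_features)):
            total[k] += value
    return [value / len(conformers) for value in total]


def element(fp, fa, fb, i):
    """Read FP(fa, fb, i) from the flat vector."""
    return fp[PAIR_INDEX[pair_key(fa, fb)] * N_BINS + (i - 1)]


# A cation and an aromatic atom, 4.5 then 5.5 Angstroms apart.
features = [{"cation"}, {"aromatic"}]
conf_a = [(0.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
conf_b = [(0.0, 0.0, 0.0), (5.5, 0.0, 0.0)]
fp_a = conformer_fingerprint(conf_a, features)
fp_mol = molecular_fingerprint([conf_a, conf_b], features)
```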

[0037] A comparison algorithmic filter IV is then applied to compare the fingerprints of the reference compounds with each fingerprint of the potential library, by encoding the leads and known ligand structures and by comparing the distance distributions corresponding to each pair of features, which yields a partial similarity score per feature pair and an overall similarity score.

[0038] Weighting factors are introduced to represent the relative importance of the different pharmacophoric features, and are the tunable parameters of the method. Indeed, the different physical, chemical or biological properties are more sensitive to the presence of some specific bipolar pharmacophores than to others: the similarity of two compounds with respect to given feature pairs is more important than the fact that the two compounds differ with respect to other feature pairs. To reflect the relative importance of the feature pairs, the weighting factors weight the partial scores in the calculation of the overall similarity score.

[0039] In the present embodiment, the comparison algorithm IV (figure 2) consists of:

- encoding, in step 15, the leads or known ligand structures under the form of fingerprints, following the same procedure as described hereinbefore, except for the fact that the conformers used are those directly generated by a run of the "Catalyst" software of MSI, to which the sketches of these reference compounds are submitted as such;

- introducing a scoring function, in step 16, which successively compares the distance distributions corresponding to each pair of features (fa,fb).

[0040] First, the 21 partial scores, expressed as pnorm1(fa,fb), pnorm2(fa,fb) and pcross(fa,fb), are calculated in the form of convolution products for every pair of features, as follows:

pnorm1(fa,fb) = Σ_{i,j=1..12} FP_mol1(fa,fb,i) · FP_mol1(fa,fb,j) · exp(-α (i-j)²)

pnorm2(fa,fb) = Σ_{i,j=1..12} FP_mol2(fa,fb,i) · FP_mol2(fa,fb,j) · exp(-α (i-j)²)

pcross(fa,fb) = Σ_{i,j=1..12} FP_mol1(fa,fb,i) · FP_mol2(fa,fb,j) · exp(-α (i-j)²)

where FP_mol1 and FP_mol2 are the fingerprints of the first compound, a reference one, and, respectively, the second compound, a tested one, i and j are variables looping over all the considered distance bins, as described at step X, and α is an exponential damping factor.

[0041] If pnorm1(fa,fb) and pnorm2(fa,fb) are simultaneously zero, the corresponding pair of features does not occur in either of the molecules; therefore, such combinations are ignored when evaluating the global similarity score between mol1 and mol2. Otherwise, the partial similarity score per feature pair, psim(fa,fb), is defined by

psim(fa,fb) = 2 pcross(fa,fb) / [pnorm1(fa,fb) + pnorm2(fa,fb)]

[0042] And the overall similarity score is given by a sim-score which involves the weighting factors:

sim-score = 1 - [Σ W(fa)W(fb) psim(fa,fb)] / [Σ W(fa)W(fb)]

where W(f) is the weighting factor for the feature f.

[0043] In the sim-score expression, both sums are taken over the feature pairs (fa,fb) for which at least one of pnorm1(fa,fb) and pnorm2(fa,fb) is not zero, i.e. the pairs that appear in at least one of the two molecules.
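
The scoring function can be sketched as below. Two points are assumptions of this sketch rather than statements of the description: the damping term is taken as exp(-α (i-j)²), and each fingerprint is held as a dictionary mapping a feature pair to its 12 bin values. The default value of α and the function names are placeholders.

```python
import math
from itertools import combinations_with_replacement

FEATURES = ("aromatic", "hydrophobic", "donor", "acceptor", "cation", "anion")
PAIRS = list(combinations_with_replacement(FEATURES, 2))  # 21 feature pairs
N_BINS = 12


def convolution(d1, d2, alpha):
    """Damped double sum over all pairs of distance bins (i, j)."""
    return sum(d1[i] * d2[j] * math.exp(-alpha * (i - j) ** 2)
               for i in range(N_BINS) for j in range(N_BINS))


def sim_score(fp_mol1, fp_mol2, weights, alpha=1.0):
    """Weighted dissimilarity: 0 for identical fingerprints, 1 for disjoint ones."""
    empty = [0.0] * N_BINS
    numerator = denominator = 0.0
    for pair in PAIRS:
        fa, fb = pair
        d1 = fp_mol1.get(pair, empty)
        d2 = fp_mol2.get(pair, empty)
        pnorm1 = convolution(d1, d1, alpha)
        pnorm2 = convolution(d2, d2, alpha)
        if pnorm1 == 0.0 and pnorm2 == 0.0:
            continue  # feature pair absent from both molecules: ignored
        psim = 2.0 * convolution(d1, d2, alpha) / (pnorm1 + pnorm2)
        w = weights[fa] * weights[fb]
        numerator += w * psim
        denominator += w
    if denominator == 0.0:
        raise ValueError("neither molecule contains any feature pair")
    return 1.0 - numerator / denominator


def one_hot(bin_index):
    """Distribution with a single populated bin (bins numbered from 1)."""
    return [1.0 if k == bin_index - 1 else 0.0 for k in range(N_BINS)]


weights = {f: 1.0 for f in FEATURES}  # placeholder weighting factors
lead = {("aromatic", "cation"): one_hot(1)}
same = {("aromatic", "cation"): one_hot(1)}
shifted = {("aromatic", "cation"): one_hot(2)}
other = {("anion", "anion"): one_hot(1)}
```

With this kernel, a distribution shifted by one bin still receives partial credit, which is the purpose of the damping factor: distances falling in neighbouring bins are treated as nearly equivalent.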

[0044] The weighting factors W(f) are the tunable parameters of the method, together with the exponential damping factor that controls the values of pnorm1, pnorm2 and pcross. The values of these tunable parameters are obtained by different calibration approaches, aimed at optimizing the overall performance of the model.

[0045] The number of fittable factors can be reduced in order to restrain the number of degrees of freedom of the
21-dimensional space, in the case of six pharmacophore features. In this case, the simplification reads:

W(fa,fb) = W(fa) · W(fb)

[0046] The weighting factors associated with the bipolar pharmacophores are then written as a product of the weighting
factors of the individual features: the search for the 21 optimal factor values is reduced to a search for only 6 values.
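
As an illustration of this reduction, the 21 pair weights can be derived from 6 per-feature weights. The sketch below is only an example; the feature labels and the function name are hypothetical, and the pairs are taken as unordered, which gives 6 × 7 / 2 = 21 combinations.

```python
from itertools import combinations_with_replacement

# Hypothetical labels for the six pharmacophore features.
FEATURES = ["hydrophobic", "aromatic", "donor", "acceptor", "cation", "anion"]


def pair_weights(feature_weights):
    """Build W(fa, fb) = W(fa) * W(fb) for every unordered feature pair."""
    return {
        (fa, fb): feature_weights[fa] * feature_weights[fb]
        for fa, fb in combinations_with_replacement(FEATURES, 2)
    }
```

Only the 6 entries of `feature_weights` then need to be fitted during calibration.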

- listing, in step 17, all the molecules mol2 of the potential library for which the similarity score, sim-score, with
respect to the reference compound mol1 is less than a specified threshold, using a sorting function on the
similarity score value. Synthesis of the retrieved structures, or of the best of them, is performed from the potential
library, and the products are subjected to biological testing. Alternatively or cumulatively, a list of all the building
blocks represented in the retrieved products is established and a focused combinatorial library is generated from such BBs.
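
The listing step can be pictured with the short sketch below. It assumes that the sim-scores against mol1 have already been computed; the product identifiers, the `composition` mapping and both function names are hypothetical.

```python
def retrieve_analogs(scores, threshold):
    """Return (product_id, score) pairs below the threshold, most similar first.

    scores: dict mapping a product identifier to its sim-score with respect
    to the reference compound mol1.
    """
    hits = [(pid, s) for pid, s in scores.items() if s < threshold]
    return sorted(hits, key=lambda item: item[1])


def building_blocks_of(hits, composition):
    """List the building blocks represented in the retrieved products.

    composition: dict mapping a product identifier to the identifiers of the
    two building blocks that compose it.
    """
    return sorted({bb for pid, _ in hits for bb in composition[pid]})
```

The second function corresponds to the alternative route, in which the retrieved products are only used to choose the BBs of a focused library.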

[0047] Two approaches are possible according to the specific search to be carried out: the General Diversity paradigm
and the Receptor-Oriented paradigm.

[0048] The General Diversity paradigm consists in choosing the weighting factors in order to obtain a similarity scoring
function which successfully discriminates between classes of ligands of different receptors. Given an arbitrary
collection of clusters of ligands associated with different receptors, each cluster consisting of families of ligands that
exclusively bind to the associated receptor, the "most diverse" subset of ligands retrieves one ligand per receptor if the
scoring function on which this most diverse subset selection has been realized has an ideal discriminative power.
Because a less discriminating function would select several ligands of the same receptor, while completely ignoring
other ligand families, the weighting factors are optimized in order to improve the discriminative power of the distances
between two molecules, by using as an objective function the number of receptors for which at least one ligand has
been picked out in the most diverse selection. The weighting factor values thus obtained characterize, if the number of
receptors is sufficiently large, the average propensities of the bipolar pharmacophores to contribute to the anchoring of
a ligand in a receptor site.
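
The objective function of this paradigm can be sketched as below. The description does not specify how the most diverse subset is selected, so a greedy MaxMin selection seeded with the first ligand is used here purely as an assumed stand-in; the function names and the `receptor_of` mapping are likewise hypothetical.

```python
def most_diverse_subset(ids, distance, size):
    """Greedy MaxMin selection (an assumed stand-in for the selection step)."""
    selected = [ids[0]]
    while len(selected) < min(size, len(ids)):
        remaining = [x for x in ids if x not in selected]
        # Pick the candidate whose nearest selected neighbour is farthest away.
        best = max(remaining, key=lambda x: min(distance(x, s) for s in selected))
        selected.append(best)
    return selected


def receptors_covered(selected, receptor_of):
    """Objective: number of receptors with at least one ligand selected."""
    return len({receptor_of[x] for x in selected})
```

A calibration loop would evaluate `receptors_covered` for each candidate set of weighting factors, the distance being the sim-score computed with those factors, and keep the set giving the highest count.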

[0049] To illustrate this paradigm, six reference ligands of the DAT receptors (dopamine carrier / 1 DM) have been
used to carry out a first modelling. Compounds of different chemistries (functional rearrangements, reductive
amination, amides, ureas, carbamates and esters) have been sorted according to the filters and the general
diversity paradigm of the present method. 42 compounds have been screened and synthesized from a vector of
21 aldehydes and a vector of 2 acids. Among them, 14 compounds show a binding inhibition (which measures
the stability of the bond) beyond 50%, with 4 beyond 70% and one above 90% (93%).

[0050] The Receptor-Oriented Diversity paradigm consists in calibrating the weighting factors on the basis of primary
screening results of a library against a given receptor, so as to minimize the average distance between any two active
compounds and to maximize the dissimilarity scores between each active and any inactive compound. This calibration
mode makes it possible to define which pharmacophores are essential for binding to a given receptor.
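
One possible way of turning these two criteria into a single quantity to optimize is sketched below. Combining them by a simple difference is an assumption made for the example, as are the function name and the `distance` callable, which stands for the sim-score evaluated with a candidate set of weighting factors.

```python
from itertools import combinations


def receptor_oriented_objective(actives, inactives, distance):
    """Mean active/inactive distance minus mean active/active distance.

    Larger is better. Combining the two criteria by subtraction is an
    assumption of this sketch, not a prescription of the method.
    """
    aa = [distance(x, y) for x, y in combinations(actives, 2)]
    ai = [distance(x, y) for x in actives for y in inactives]
    if not aa or not ai:
        raise ValueError("need at least two actives and one inactive")
    return sum(ai) / len(ai) - sum(aa) / len(aa)
```

Weighting factors that bring the actives close together while pushing the inactives away yield a higher value, which points to the pharmacophores that matter for the receptor under study.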

[0051] According to an example, analogs of the serotonin activity on the 5-HT4 receptor (ability of compounds to
inhibit 5-HT4 induced contractions of guinea-pig ileum preparations) have been modelled according to the
receptor-oriented diversity score function of the invention. 60 carbamates, 35 esters, 4 ureas and one amide have been
retrieved. The activity (inhibition above 50%) of 14 compounds has been confirmed.

[0052] The invention is not limited to the examples as described and illustrated. In particular, different reference
libraries corresponding to different chemistries can supply the reference library of the BBs. In other respects, the present
invention can be applied to different contexts, for instance to search for analogs of a given product having specified
chemical properties (a detergent, ...).

Claims 

1. Method to rapidly retrieve potentially active analogs of lead compounds, wherein 3D fingerprints are generated 
and screened from a large database of 3D multiconformational fingerprints of chemically feasible combinatorial 
products as follows: 

- selecting systematically by a chemical filter algorithm, according to elementary chemical rules, couples of rad- 
icals each including a reactive center, as valid reaction partners, from Building Blocks (BBs) constituents of a 
reference library, the reactive centers being detected by analysing the molecular connectivity and the leaving 
groups being deleted;

- modeling the radicals by linking them temporarily to a bulky spacekeeper group and submitting the resulting 
complex to conformational sampling run yielding a collection of conformers, then removing the spacekeeper in 
order to ensure that the reactive centers of the radicals are sterically accessible and that each free valency points
towards a region of free space previously occupied by the spacekeeper; 

- registering such 3D models of the radicals in a combinatorial ghost database, after verifying that these models 
comply with the coupling ability according to sterical hindrance and conformational diversity criteria of the 
ghost database, this ghost database emulating a database of combinatorial products in instantly generating 
the structure of any such product by linking together the registered structures of the constituting radicals; 

- for any molecular structure that is accessible within the ghost database, detecting any atom that displays phys- 
ical property features of the pharmacophoric type, involved in determining the intensity of intermolecular inter- 
actions, at least the hydrophobic, aromatic, hydrogen bond donor, hydrogen bond acceptor, anionic or cationic 
characters, on the basis of elementary rules accounting for the chemical nature of such an atom and the 
molecular environment in which it is placed, and listing all the possible bipolar pharmacophores pairs (BP) in 
which each atom displays a given pharmacophoric feature; 

- for the pharmacophores (BPs) detected in each molecular structure, calculating all the distances between the
involved atoms in every conformation of this molecule and creating a BP distance distribution density, by fuzzy 
monitoring of the number of BPs for which the associated distances fall within given ranges; 

- generating a conformational fingerprint vector that contains all the distance distribution densities of the phar- 
macophores (BPs) associated to a current conformation, obtaining an average molecular fingerprint from the 
conformational fingerprints of the considered conformers to constitute and supply a potential library, 

- defining a scoring function for each molecular fingerprint accounting for the relative importance of the
pharmacophoric features owing to weighting factors that are calibrated in order to maximize a discriminative power
with respect to different binding affinities; and 

- generating the fingerprints of the lead for which analogs are to be retrieved, comparing these fingerprints to 
each fingerprint of the potential library according to the above scoring function as maximized for the lead, and
retrieving the molecules of the potential library for which the scoring function gives score values less than a
specified threshold. 

2. Method according to claim 1, wherein the weighting factors of the scoring function are calibrated in order to
maximize the discriminative power between families of ligands of different receptors in a "General Diversity paradigm"
by using as an objective function the number of receptors for which at least one ligand has been picked out in the
most diverse selection.

3. Method according to claim 1, wherein the weighting factors of the scoring function are calibrated in order to
maximize the discriminative power between the compounds that bind to a given receptor, in contrast to those that have
no binding affinity with respect to it, in a "Receptor-Oriented Diversity paradigm" by means of primary screening results
of a library against a given receptor, so as to minimize the average distance between any two active compounds
and to maximize the dissimilarity scores between each active and any inactive compound.

4. Method according to any one of the preceding claims, characterised in that preliminary checking steps are included
in order to discard current building blocks which could include potentially interfering reactive groups and/or to add
functional groups to a building block prior to coupling.

5. Method according to any one of the preceding claims, characterised in that the yielding of 3D conformers is
completed with a step of encoding structural information over a geometrical filter, by extracting from a fast
conformational sampling analysis of the molecular structures of 2D-sketches of the selected radicals in connection with
pharmacophoric features and with respect to each specific chemical group involved in each radical to detect and discard
remaining conformations of important steric hindrance around the reactive center and conformations similar to
other sampled ones.

6. Method according to any one of the preceding claims, characterised in that the building blocks currently available
from a reference library and a set of synthesis protocols are regularly updated, each such update automatically
triggering the update of the potential libraries.

7. Method according to claim 1, characterised in that it is integrated into a discovery paradigm defining a feedback
loop starting from a primary, "blind" screening of a compound library, and including the steps of identification of 
active compounds, training and adjusting parameters of a predictive model to recognize the specific features of 
active compounds, retrieval of potentially active analogs for these compounds out of a reference library according 
to the instant method, synthesis of these analogs and screening to refine the parameters. 

8. Method according to any one of the preceding claims, characterised in that it comprises essentially the following
steps:

- scanning a reactivity profile of the current synthesis consisting of required groups and interfering groups (step

- checking if the selected BB contains a single molecule and deleting the counterions (step 2); 

- checking the presence of any interfering groups in the selected BB and discarding such current BB (step 3);

- checking the presence of required functional groups in the BB and figuring out the reactive center involved in 
the bond with a second BB to form reactive partner radicals (S1, S2), eliminating the leaving groups (step 4);

- adding hydrogens to the heavy atom skeleton of the sketches of each radical (S1, S2), at least six pharmacophore
features, namely aromaticity, hydrophobicity, hydrogen bond donor or acceptor property, positive and negative
charge of every atom, being checked and listed (step 6);

- anchoring the 2D-sketch of the radical with its free valency to the following bulky spacekeeper group (step 7);

- submitting the resulting 2D-sketch of this compound to a known conformational sampling run, yielding a
collection of conformers (C) (step 8);

- severing and deleting, for each of the conformers obtained (step 8), the spacekeeper moiety, restoring the free
valency of the radical which now points towards the empty region previously occupied by the spacekeeper
(step 9);

- performing, for each retained conformer (C), a coordinate transformation in a reference system (OXYZ), in
order to place the reactive center in the origin of the reference system and to align the free valency along the 
Z-axis (step 9); 

- generating a list of all the compounds that are obtained by coupling each radical of the first set (S1) with every
one of its partners (S2) (step 10); 

- looping over all pairs listed (step 10) and storing current pairs of radicals (S1, S2) together with all their
available conformers (C), as obtained (step 9), and the lists of pharmacophoric features of the composing atoms
formerly used, the pharmacophore features of the atoms linked by new bonds being reevaluated (step 11);

- linking each conformer (C1) of the first radical (S1) with each conformer (C2) of the second radical (S2) by
mirroring the coordinates of the latter with respect to the XOY plane, translating them along an axis (Z) in order
to restore the correct length of the newly formed bond between the conformers (C1, C2), and rotating around
the new axis (step 12);

- in order to construct the distance distribution density of pairs of pharmacophoric features, evaluating a
complete set of interatomic distances for the current conformation of each dimer, all the distances between pairs
of atoms which match a given pair of pharmacophoric features and taking a value between n and n+1
Angstroms being counted (step 13);

- representing a generated conformational fingerprint vector describing the current geometry with a number of
elements with respect to the number of combinations that can be obtained with the pharmacophoric features,
each fingerprint element (FP) being equal to the number of atom pairs matching a given pair of
pharmacophoric features (fa,fb) and which are separated by a distance between i+3 and i+4 Angstroms (step 14);

- summing the conformational fingerprints to obtain a molecular fingerprint vector, followed by norming this sum
with respect to the number of considered conformers of the product, each fingerprint vector being stored in
association with the identification codes of the two radicals which compose that product; the collection of all the
fingerprints of all the possible coupling products between all BBs qualifying the initial synthesis processes and
defining potential libraries (step 14);

- encoding leads or known ligand structures in the form of fingerprints, following the same procedure, the
conformers being directly generated by a conformational sampling run to which the sketches of these reference
compounds are submitted as such (step 15);

- introducing a scoring function, which successively compares the distance distributions corresponding to each
pair of features (fa, fb) (step 16); and

- listing all the molecules (mol2) of the potential library for which the similarity score (sim-score) with respect
to the reference compound (mol1) is less than a specified threshold, owing to a sorting function in respect of
the similarity score value (step 17).

9. Method according to claim 4, characterised in that specific chemical groups involved are treated as follows:

- aliphatic amino groups are under cationic form, while aromatic amines are taken as neutral;

- a special flag is used to signal the presence of imidazole rings, which may appear under physiological
conditions, at neutral pH, under either protonated or unprotonated form;

- carboxylate, sulphonate, phosphate groups and tetrazoles are considered to be anions.

10. Method according to claim 5, characterised in that steric hindrance criteria are then applied in order to check
whether the spacekeeper strategy has managed to ensure an appropriate accessibility of the reactive center for
every conformation; the criteria are for instance:

- a comparison of the interatomic distances in the different conformations in order to make sure that the retained
conformers are not redundant;

- an energy criterion discarding higher-energy conformers found to have an almost identical geometry with
respect to lower-energy conformations.

11. Method according to claim 8, characterised in that each conformer (C1) of the first radical (S1) is linked with
each conformer (C2) of the second radical (S2) by mirroring the coordinates of the latter with respect to a plane
(XOY) of the reference system, translating them along the axis (Z) perpendicular to that plane (XOY) in order to
restore the correct length of the newly formed bond between the conformers (C1, C2), and rotating them around
the new axis.

13. Method according to any one of the preceding claims, characterised in that the reactivity profile includes
transformers to be coupled to the reactive center of a first BB prior to its coupling to the second BB, each synthesis
protocol involving functional transformers to be appended to the first BB.

14. Method according to any one of the preceding claims, characterised in that the bulky spacekeeper group is a
tris(triiodosilyl)methyl entity.

15. Method according to any one of the preceding claims, characterised in that, the number of pharmacophore
features being six, a generated conformational fingerprint vector describing the current geometry is represented
with a 252-element vector, in respect of the number of combinations obtained with six features.

16. Method according to any one of the preceding claims, characterised in that synthesis of the retrieved structures
is performed from the potential library, the products being subjected to biological testing, and that a list of all the BBs
represented in the retrieved products is established and a focused combinatorial library is generated from such BBs.